ABSTRACT

A total of 149 students enrolled in an undergraduate
nursing research methods course participated in a study comparing
three strategies for using formative evaluation (test feedback
throughout a course) to predict students at risk of failure at
summative evaluation (the final examination). Students took 12 weekly
multiple-choice quizzes, which were graded and returned for
self-study, and a final 60-item multiple-choice exam. Three 4-week
quiz subtotals were the discriminating variables used to predict
membership in three final-exam score categories: Group 1 (poor);
Group 2 (fair); Group 3 (good). Separate discriminant analyses tested
three patterns of assigning prior probabilities of group membership:
(1) equal (each .333); (2) proportional to actual numbers of students
in each group; (3) weighted by setting cost of misclassifying poor
students as three times more serious than cost of misclassifying fair
or good students. A significant discriminant function emerged, and
confirming previous results, effect size (a standardized measure of
the discrepancy between performance and the overall mean) for poor
students decreased over time, showing that they were "closing the
gap." Assigning probabilities proportional to cases gave best overall
classification accuracy (53.02%), but Bayesian weighted adjustment
best predicted students at risk of failure (82.1% correctly
classified) while sacrificing some overall predictive power (42.95%
correct). (LPG)



Frequent testing with instructor-made tests has become a
common practice in many colleges and universities, especially in
courses such as research design or statistics where the
acquisition of a hierarchy of skills is required. In courses
where short quizzes are given during each class period, a
significant proportion of class time is given over to such
testing during the semester. It is important that this practice
be evaluated in terms of its utility in promoting learning,
improving instruction and identifying learning problems which may
require intervention.

The use of frequent quizzes to monitor progress is an example
of what Bloom, Hastings and Madaus (1971) have called formative
evaluation. Formative evaluation entails the collection of
relevant data to provide guidance for the learner, and to indicate
the need for modifications in teaching strategies. Used properly,
it can improve the instructor's ability to meet individual needs.
The major goal of formative evaluation is to provide feedback
during the learning process on errors and misconceptions, rate of
progress, and achievement relative to an acceptable level of
competence. Summative evaluation, in contrast, provides a general
assessment of student achievement over an entire course or large
unit and is usually the major determinant of course grades.

According to Bloom, one potential use of formative evaluation
data is in predicting the outcome of summative evaluation. Since
there is usually considerable overlap between the two kinds of
assessment in terms of content, behaviors and testing procedures,
the two kinds of test results are likely to be highly correlated.
Thus, it may be possible to predict performance on summative
tests in advance, and to alter the prediction for the better.
Empirical evidence suggests that students use data from formative
evaluation to modify their study habits and improve their
performance over time. Wolfe (1981), in a study of students in an
undergraduate nursing research course, found that prediction of
midterm examination scores from pre-midterm quizzes was
considerably more accurate than the prediction of final
examination scores from pre-final exam quizzes. She concluded
that by the time half a term had passed, students had learned to
use the results of weekly quizzes to change the prediction as
Bloom suggested. In another study with the same population
(Wolfe, 198[?]), weekly quiz scores were summed over each of four
three-week periods. Final exam scores were dichotomized as
satisfactory (A or B) or unsatisfactory (C or below).
Discriminant analyses revealed that 71% of the final examination
scores were correctly classified as satisfactory or
unsatisfactory on the basis of quiz subtotal scores. Examination
of group differences on the individual discriminating variables
showed that mean scores for the "satisfactory" group were higher
on the first three quiz subtotals than those of the
"unsatisfactory" group. However, during the last quarter term
this difference was reversed, with the "unsatisfactory" group
achieving a slightly higher mean quiz subtotal. This finding
suggested that students having more global difficulty with course
material, as reflected in lower final examination scores, may
have tried somewhat harder to modify their study habits and
improve their standing toward the end of the term, compared to
their relatively secure classmates.

In a third study (Wolfe), the extent to which grades on
a comprehensive final examination could be classified as good (A
or B), fair (C) or poor (D or F) on the basis of weekly quiz
scores was determined. Students in an undergraduate nursing
research course were given 12 short weekly quizzes and a
comprehensive final examination. Three discriminant analyses were
performed with recoded final examination scores as the grouping
variable. Discriminating variables for the analyses consisted of
the four quiz scores from the first, middle and last third of the
semester, respectively. Although quiz scores discriminated
significantly between the groups for each time period (p<.05),
the percentage of cases correctly classified decreased over time.
The fact that final examination grades became less predictable
over time further supported Bloom's conjecture that students
may indeed modify their study habits and change the forecast.

In using discriminant analysis to predict group membership on
the basis of a set of measurements, an individual is assigned to
that group for which he or she has the highest posterior
probability of membership. Most computer programs (e.g., SPSS-X)
offer several options for determining the so-called prior
probability of group membership. The prior probability of a given
population is the probability that an individual selected at
random actually comes from that population. For instance, in a
three-group discriminant analysis in which the groups are equal
in size, a straightforward assumption would be that a person
selected at random has a prior probability of one-third of being
classified into any one of the groups. That is, without knowing
any of the individual's characteristics, we are equally likely to
classify him or her as belonging to group 1, 2 or 3. However, a
Bayesian adjustment of this prior probability may be advisable
if the group sizes differ widely or if the costs of
misclassification into certain groups are considered very high.
For instance, in the case of students who are at risk of academic
failure or poor performance, the cost of failing to identify them
early may be regarded as several times greater than the cost of
misclassifying students whose performance is satisfactory.
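
The role of the prior probabilities in the assignment rule can be sketched in a few lines of Python. This is only an illustration of Bayes' rule, not the procedure of any particular statistical package: the likelihood values and the weighted priors below are hypothetical numbers chosen for the example, and the function names are invented here.

```python
def posterior(priors, likelihoods):
    """Posterior probabilities: prior times likelihood, rescaled to sum to 1."""
    joint = [p * f for p, f in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def assign(priors, likelihoods):
    """Return the 1-based group with the highest posterior probability."""
    post = posterior(priors, likelihoods)
    return post.index(max(post)) + 1


# Hypothetical likelihoods of one student's quiz scores under groups 1, 2, 3.
likelihoods = [0.30, 0.40, 0.30]

equal_priors = [1 / 3, 1 / 3, 1 / 3]
weighted_priors = [0.6, 0.2, 0.2]  # hypothetical priors favoring group 1

print(assign(equal_priors, likelihoods))     # 2
print(assign(weighted_priors, likelihoods))  # 1
```

With equal priors the student is placed in the group whose likelihood is largest; raising the prior for group 1 is enough to move the same student into that group.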

The purpose of the present study was to compare the effects 
of three different procedures for specifying the prior 
probabilities of group membership on improving the ability to 
correctly classify students at risk of failure in an
undergraduate research methods course. 

METHODS 

Sample. One hundred forty-nine students in five sections of an
undergraduate course in nursing research methods participated in
the study. All sections were taught by the investigator during
1984 and 1985.
Procedure. For each section, 12 multiple-choice quizzes with five
to ten items were given, one each week after the first week of
class. Each quiz covered content which had been presented and
reviewed during the previous class session. Students exchanged
completed quizzes with their neighbors for grading, and the
correct answers were read in class by the instructor. The quizzes
were returned to the examinees and each question was discussed
in as much detail as needed. Following review, quizzes were
collected and grades recorded by the instructor. The following
week, quizzes were returned to students with the suggestion that
they keep them to aid in reviewing for the final examination. The
course content was the same in each section, with the first half
term devoted to descriptive and correlational statistics,
measurement and research design, while the second half dealt with
statistical inference. On the last day of class, a comprehensive
60-item multiple-choice examination was given. Students were
invited to contact the instructor to arrange for individual
conferences regarding their performance.

RESULTS

For the purpose of statistical analysis, quiz scores were
summed over each of three four-week periods, in order to enhance
predictor reliability and ensure a favorable ratio of subjects to
variables. Final examination scores were recoded as follows:
Group 1, poor (41 or below); Group 2, fair (42-47); Group 3,
good (48-60).
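
The two data-preparation steps described above (summing the weekly quizzes into four-week subtotals and recoding the final examination score) might be sketched as follows. The function names and the sample quiz scores are hypothetical; only the cut points 41, 47 and 60 come from the text.

```python
def period_subtotals(quiz_scores, weeks_per_period=4):
    """Sum weekly quiz scores over consecutive blocks of weeks."""
    return [sum(quiz_scores[i:i + weeks_per_period])
            for i in range(0, len(quiz_scores), weeks_per_period)]


def recode_final(score):
    """Map a final exam score (0-60) to Group 1 (poor), 2 (fair) or 3 (good)."""
    if score <= 41:
        return 1
    if score <= 47:
        return 2
    return 3


quizzes = [4, 5, 3, 4, 5, 5, 4, 3, 5, 4, 4, 5]  # hypothetical weekly scores
print(period_subtotals(quizzes))                     # [16, 17, 18]
print([recode_final(s) for s in (41, 42, 47, 48)])   # [1, 2, 2, 3]
```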



Three stepwise discriminant analyses were performed on the
data, with the recoded final examination scores as the dependent
measure and the three four-week quiz subtotals as the
discriminating variables. For the first analysis the prior
probabilities were all assumed to equal 0.3333. For the second
analysis, the prior probabilities were specified as the
proportions of cases in each group: for Group 1, 0.2617; for
Group 2, 0.4295; for Group 3, 0.3087. For the third analysis, a
procedure suggested by Afifi and Clark (1984) was followed. The
investigator considered it three times as serious to misclassify
a poor student as it was to misclassify a fair or good student.
Thus, the proportions of cases in groups 1, 2 and 3 were
multiplied by 3, 1 and 1, respectively:

     For Group 1: adjusted p1 = .2617 x 3 = .7851
     For Group 2: adjusted p2 = .4295 x 1 = .4295
     For Group 3: adjusted p3 = .3087 x 1 = .3087

Since the prior probabilities must sum to 1, the probabilities
computed above were further adjusted by dividing each one by
.7851 + .4295 + .3087 = 1.5233. Thus, the final values for the
prior probabilities were:

     For Group 1: q1 = .515
     For Group 2: q2 = .282
     For Group 3: q3 = .203
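
The arithmetic of the cost-weighted adjustment can be checked with a short Python sketch. The group sizes (39, 64 and 46) are taken from Table 1 and the weights 3, 1 and 1 from the text; the function and variable names are illustrative only.

```python
group_sizes = {1: 39, 2: 64, 3: 46}   # cases per group (Table 1)
cost_weights = {1: 3, 2: 1, 3: 1}     # relative cost of misclassifying each group


def weighted_priors(sizes, weights):
    """Weight the proportional priors by cost, then rescale them to sum to 1."""
    total = sum(sizes.values())
    proportional = {g: n / total for g, n in sizes.items()}
    weighted = {g: proportional[g] * weights[g] for g in sizes}
    norm = sum(weighted.values())
    return {g: w / norm for g, w in weighted.items()}


priors = weighted_priors(group_sizes, cost_weights)
for g, q in priors.items():
    print(f"Group {g}: q = {q:.3f}")
# Group 1: q = 0.515
# Group 2: q = 0.282
# Group 3: q = 0.203
```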

The results of the three analyses are shown in Table 1.



Table 1 about here 
One significant discriminant function was found (chi-square =
35.297, df = 4, p = .0000).

DISCUSSION 

The fact that quiz subtotals discriminated significantly
among students classified as poor, fair or good on the basis of
final examination performance is in accord with results obtained
earlier by the same investigator. Examination of univariate
F-ratios showed that the groups differed significantly during all
three periods (p<.05). The F-ratios were considerably larger
during the first two periods (F = 11.71 and F = 13.13,
respectively) than during the last period (F = 3.52), suggesting
that, as in the earlier studies, the weaker students may have
attempted to close the gap between their performance and that of
their stronger peers. This observation was validated by
computing "effect sizes" for each group for each time period (see
Table 2).

Table 2 about here

Effect size was computed by subtracting the grand mean
for each time period from the group mean, and dividing this
difference by the total standard deviation for all 149 cases. For
the "poor" group, the effect size, a standardized measure of the
discrepancy between their performance and the overall mean,
showed a small but consistent decrease from the beginning to the
end of the term.
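
The effect size computation described above can be written out as a minimal Python sketch. The scores are hypothetical, and the use of the sample standard deviation (`statistics.stdev`, with n - 1 in the denominator) is an assumption, since the text does not say which form of the standard deviation was used.

```python
from statistics import mean, stdev


def effect_size(group_scores, all_scores):
    """(group mean - grand mean) / standard deviation of all cases."""
    return (mean(group_scores) - mean(all_scores)) / stdev(all_scores)


# Hypothetical quiz subtotals for one time period.
all_scores = [10, 12, 14, 16, 18, 20]
poor_group = [10, 12]

print(round(effect_size(poor_group, all_scores), 3))
```

A negative value indicates a group performing below the overall mean; values moving toward zero across periods correspond to the group "closing the gap."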



Of major interest is the effect of adjusting the prior
probabilities on the percentage of students at risk of
failure correctly classified. Table 1 shows that, although the
largest overall percentage of cases correctly classified (53.02%)
was obtained when the prior probabilities were made
proportional to group sizes, the "poor" group, because of its
relatively small size, was assigned the smallest prior
probability. As a result, only 38.5% of the students in that
group were correctly classified; nearly two-thirds would have
been incorrectly identified as "fair" or "good" performers. In
contrast, when the cost of misclassification of poor performers
was taken into account and the prior probabilities adjusted
accordingly, 82.1% of the students in this group were correctly
classified. However, the overall percentage of cases correctly
classified was only 42.95%.
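
The percentages discussed above follow directly from the classification counts in Table 1, as the Python sketch below shows. Rows are actual groups and columns are predicted groups; the dictionary keys and function names are illustrative labels, not terms from the original analysis.

```python
# Classification counts from Table 1 (rows: actual group; columns: predicted).
tables = {
    "equal priors": [[25, 5, 9], [21, 18, 25], [3, 13, 30]],
    "proportional priors": [[15, 20, 4], [11, 40, 13], [2, 20, 24]],
    "cost-weighted priors": [[32, 3, 4], [42, 9, 13], [18, 5, 23]],
}


def overall_accuracy(table):
    """Percent of all cases that fall on the diagonal (correctly classified)."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return 100 * correct / total


def group_hit_rate(table, group):
    """Percent of one actual group (1-based) that was correctly classified."""
    row = table[group - 1]
    return 100 * row[group - 1] / sum(row)


for name, table in tables.items():
    print(f"{name}: overall {overall_accuracy(table):.2f}%, "
          f"poor group {group_hit_rate(table, 1):.1f}%")
# equal priors: overall 48.99%, poor group 64.1%
# proportional priors: overall 53.02%, poor group 38.5%
# cost-weighted priors: overall 42.95%, poor group 82.1%
```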

Clearly, there is a tradeoff involved in the decision to
weight prior probabilities according to the perceived cost of
misclassification. The method selected must be guided by the
purpose of the statistical analysis as well as the personal
philosophy of the investigator. One must weigh the potential harm
done to a good student who is mistakenly informed that he or she
is in academic jeopardy against the possibly greater damage which
would occur if a student at risk of failure is not identified
early enough for effective intervention.
Table 1. Classification results for analyses 1, 2 and 3

Analysis 1 (priors equal)

                                 Predicted group membership
Actual group    No. of cases        1          2          3

Group 1              39            25          5          9
                                  64.1%      12.8%      23.1%

Group 2              64            21         18         25
                                  32.8%      28.1%      39.1%

Group 3              46             3         13         30
                                   6.5%      28.3%      65.2%

Percent of grouped cases correctly classified: 48.99%

Analysis 2 (priors proportional to group size)

                                 Predicted group membership
Actual group    No. of cases        1          2          3

Group 1              39            15         20          4
                                  38.5%      51.3%      10.3%

Group 2              64            11         40         13
                                  17.2%      62.5%      20.3%

Group 3              46             2         20         24
                                   4.3%      43.5%      52.2%

Percent of grouped cases correctly classified: 53.02%
Table 1 (continued)

Analysis 3 (priors weighted by cost of misclassification)

                                 Predicted group membership
Actual group    No. of cases        1          2          3

Group 1              39            32          3          4
                                  82.1%       7.7%      10.3%

Group 2              64            42          9         13
                                  65.6%      14.1%      20.3%

Group 3              46            18          5         23
                                  39.1%      10.9%      50.0%

Percent of grouped cases correctly classified: 42.95%
Table 2. Effect sizes for the three groups at each time period

              1st period    2nd period    3rd period

Group 1         -0.553        -0.494        -0.332

Group 2          0.025        -0.076         0.015

Group 3          0.430         0.525         0.257